HAlign: Fast multiple similar DNA/RNA sequence alignment based on the centre star strategy

نویسندگان

  • Quan Zou
  • Qinghua Hu
  • Maozu Guo
  • Guohua Wang
چکیده

MOTIVATION Multiple sequence alignment (MSA) is important work, but bottlenecks arise in the massive MSA of homologous DNA or genome sequences. Most of the available state-of-the-art software tools cannot address large-scale datasets, or they run rather slowly. The similarity of homologous DNA sequences is often ignored. Lack of parallelization is still a challenge for MSA research. RESULTS We developed two software tools to address the DNA MSA problem. The first employed trie trees to accelerate the centre star MSA strategy. The expected time complexity was decreased to linear time from square time. To address large-scale data, parallelism was applied using the hadoop platform. Experiments demonstrated the performance of our proposed methods, including their running time, sum-of-pairs scores and scalability. Moreover, we supplied two massive DNA/RNA MSA datasets for further testing and research.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences

Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...

متن کامل

Parallel Implementation of Centre Star Method in SMP Cluster for Multiple Sequence Alignment

Multiple Sequence Alignment is one of the most fundamental problems in computational molecular biology. The running time of the best known scheme for finding an optimal alignment, based on dynamic programming, increases exponentially with large number of input sequences. Hence, many approximation algorithms are suggested for the problem. Most of the bounded error approximation multiple sequence...

متن کامل

An Application of the ABS LX Algorithm to Multiple Sequence Alignment

We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...

متن کامل

A comparative phylogenetic analysis of Theileria spp. by using two two "18S ribosomal RNA" and "Theileria annulata merozoite surface antigen" gene sequences

More than 185 species, strains and unclassified Theileria parasites are categorized in the Entrez Taxonomy. The accurate diagnosis and proper identification of the causative agents are important for understanding the epidemiology, prevention and appropriate treatment. This study aims to discuss the importance of two genes of Theileria annulata 18S ribosomal RNA (18S rRNA) and Theileria annulata...

متن کامل

PCMA: fast and accurate multiple sequence alignment based on profile consistency

UNLABELLED PCMA (profile consistency multiple sequence alignment) is a progressive multiple sequence alignment program that combines two different alignment strategies. Highly similar sequences are aligned in a fast way as in ClustalW, forming pre-aligned groups. The T-Coffee strategy is applied to align the relatively divergent groups based on profile-profile comparison and consistency. The sc...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Bioinformatics

دوره 31 15  شماره 

صفحات  -

تاریخ انتشار 2015